51 research outputs found

    SURVIVAL ANALYSIS AND LENGTH-BIASED SAMPLING

    When survival data are collected as part of a prevalent cohort study, the recruited cases have already experienced their initiating event. These prevalent cases are then followed for a fixed period of time, at the end of which the subjects will either have failed or have been censored. When interest lies in estimating the survival distribution, from onset, of subjects with the disease, one must take into account that the survival times of the cases in a prevalent cohort study are left-truncated. When it is possible to assume that there has not been any epidemic of the disease over the period covering the onset times of the subjects, one may assume that the underlying incidence process generating the initiating event times is a stationary Poisson process. Under such an assumption, the survival times of the recruited subjects are called “length-biased”. I discuss the challenges one faces in analyzing this type of data. To address the theoretical aspects of the work, I present asymptotic results for the NPMLE of the length-biased as well as the unbiased survival distribution. I also discuss estimating the unbiased survival function using only the follow-up time. This addresses the case in which the onset times are either unknown or known with uncertainty. Some of our most recent work and open questions will be presented, including aspects of covariate analysis, strong approximation, the functional LIL, and density estimation under length-biased sampling with right censoring. The results will be illustrated with survival data from patients with dementia, collected as part of the Canadian Study of Health and Aging (CSHA).
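    To make the length-biased mechanism concrete, here is a minimal simulation sketch (not from the abstract itself; the exponential lifetime distribution and the sample sizes are illustrative assumptions). It shows that a prevalent cohort over-samples durations in proportion to their length, and that a simple inverse-length (harmonic-mean) weighting recovers the unbiased mean, ignoring censoring:

```python
import random
import statistics

random.seed(0)
mu = 2.0  # true (unbiased) mean survival time

# Unbiased survival times: exponential with mean mu.
true_times = [random.expovariate(1 / mu) for _ in range(200_000)]

# Length-biased resample: selection probability proportional to the
# duration t, mimicking a prevalent cohort under a stationary
# incidence process (density t * f(t) / mu).
lb_times = random.choices(true_times, weights=true_times, k=100_000)

# For an exponential, the length-biased mean is E[T^2]/E[T] = 2 * mu.
lb_mean = statistics.fmean(lb_times)

# Inverse-length (harmonic-mean) weighting recovers the unbiased mean,
# since E_LB[1/T] = 1/mu under length-biased sampling.
unbiased_mean = 1.0 / statistics.fmean(1.0 / t for t in lb_times)
```

    With `mu = 2.0`, the length-biased sample mean lands near 4.0 while the harmonic-mean correction recovers a value near 2.0, illustrating why naive averaging over a prevalent cohort is biased.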

    Large-sample study of the kernel density estimators under multiplicative censoring

    The multiplicative censoring model introduced in Vardi [Biometrika 76 (1989) 751–761] is an incomplete data problem whereby two independent samples from the lifetime distribution G, X_m = (X_1, ..., X_m) and Z_n = (Z_1, ..., Z_n), are observed subject to a form of coarsening. Specifically, sample X_m is fully observed, while Y_n = (Y_1, ..., Y_n) is observed instead of Z_n, where Y_i = U_i Z_i and (U_1, ..., U_n) is an independent sample from the standard uniform distribution. Vardi [Biometrika 76 (1989) 751–761] showed that this model unifies several important statistical problems, such as the deconvolution of an exponential random variable, estimation under a decreasing density constraint, and an estimation problem in renewal processes. In this paper, we establish the large-sample properties of kernel density estimators under the multiplicative censoring model. We first construct a strong approximation for the process √k(Ĝ − G), where Ĝ is a solution of the nonparametric score equation based on (X_m, Y_n), and k = m + n is the total sample size. Using this strong approximation and a result on the global modulus of continuity, we establish conditions for the strong uniform consistency of kernel density estimators. We also make use of this strong approximation to study the weak convergence and integrated squared error properties of these estimators. We conclude by extending our results to the setting of length-biased sampling. Comment: Published in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org), DOI: http://dx.doi.org/10.1214/11-AOS954.
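    Vardi's data-generating mechanism, together with a plain kernel density estimate of the observed-sample density, can be sketched as follows. This is a toy illustration, not the paper's pooled score-equation estimator; taking G to be standard exponential is an assumption, under which Y = U·Z has density f_Y(y) = ∫_y^∞ e^{−z}/z dz = E1(y), the exponential integral:

```python
import math
import random

random.seed(1)
m, n = 100_000, 100_000

# Lifetime distribution G: standard exponential (illustrative assumption).
x = [random.expovariate(1.0) for _ in range(m)]   # fully observed sample X_m
z = [random.expovariate(1.0) for _ in range(n)]   # latent sample Z_n
y = [random.random() * zi for zi in z]            # observed Y_i = U_i * Z_i

# A plain Gaussian kernel density estimator evaluated at a single point.
# (The paper's estimator pools both samples via the nonparametric score
# equation; here we only smooth the coarsened sample Y_n.)
def kde(sample, y0, h):
    c = 1.0 / (h * math.sqrt(2 * math.pi))
    return c * sum(math.exp(-0.5 * ((y0 - s) / h) ** 2) for s in sample) / len(sample)

f_hat = kde(y, 0.5, h=0.05)
# For G = Exp(1), the true density of Y at 0.5 is E1(0.5) ≈ 0.5598.
```

    The point estimate `f_hat` should land close to E1(0.5); note that f_Y is always decreasing (it is a mixture of uniforms), which is exactly the link to the decreasing-density estimation problem mentioned in the abstract.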

    Change-point Problem and Regression: An Annotated Bibliography

    The problems of identifying changes at unknown times and of estimating the location of changes in stochastic processes are referred to as the change-point problem or, in the Eastern literature, as the “disorder” problem. The change-point problem, first introduced in the quality control context, has since developed into a fundamental problem in the areas of statistical control theory, stationarity of a stochastic process, estimation of the current position of a time series, testing and estimation of change in the patterns of a regression model, and most recently in the comparison and matching of DNA sequences in microarray data analysis. Numerous methodological approaches have been implemented in examining change-point models. Maximum-likelihood estimation, Bayesian estimation, isotonic regression, piecewise regression, quasi-likelihood and non-parametric regression are among the methods which have been applied to resolving challenges in change-point problems. Grid-searching approaches have also been used to examine the change-point problem. Statistical analysis of change-point problems depends on the method of data collection. If the data collection is ongoing until some random time, then the appropriate statistical procedure is called sequential. If, however, a large finite set of data is collected with the purpose of determining whether at least one change-point occurred, then this may be referred to as non-sequential. Not surprisingly, both the former and the latter have a rich literature, with much of the earlier work focusing on sequential methods inspired by applications in quality control for industrial processes. In the regression literature, the change-point model is also referred to as two- or multiple-phase regression, switching regression, segmented regression, two-stage least squares (Shaban, 1980), or broken-line regression. The area of the change-point problem has been the subject of intensive research in the past half-century.
    The subject has evolved considerably and found applications in many different areas. It seems rather impossible to summarize all of the research carried out over the past 50 years on the change-point problem. We have therefore confined ourselves to those articles on change-point problems which pertain to regression. The important branch of sequential procedures in change-point problems has been left out entirely; we refer the readers to the seminal review papers by Lai (1995, 2001). The so-called structural change models, which occupy a considerable portion of the research in the area of change-point, particularly among econometricians, have not been fully considered. We refer the reader to Perron (2005) for an updated review in this area. Articles on change-point in time series are considered only if the methodologies presented in the paper pertain to regression analysis.
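    As a concrete illustration of the grid-searching approach mentioned above, the following sketch estimates a single change-point in two-phase (broken-line) regression by minimizing the pooled residual sum of squares over all admissible splits. The synthetic data, noise level, and break location are illustrative assumptions, not drawn from any paper in the bibliography:

```python
import random

random.seed(2)

# Synthetic two-phase data: slope changes from 1.0 to 3.0 at x = 5.
n = 200
xs = [10 * i / n for i in range(n)]
ys = [(x if x < 5 else 5 + 3 * (x - 5)) + random.gauss(0, 0.3) for x in xs]

def sse(xseg, yseg):
    """Residual sum of squares of a simple least-squares line fit."""
    m = len(xseg)
    mx, my = sum(xseg) / m, sum(yseg) / m
    sxx = sum((x - mx) ** 2 for x in xseg)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xseg, yseg))
    b = sxy / sxx
    a = my - b * mx
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xseg, yseg))

# Grid search: try every admissible split index (keeping a minimum
# segment size) and keep the one minimizing the pooled SSE.
best = min(range(10, n - 10),
           key=lambda k: sse(xs[:k], ys[:k]) + sse(xs[k:], ys[k:]))
change_point = xs[best]
```

    The recovered `change_point` should be very close to the true break at x = 5; the quadratic cost of the grid search is what motivates the more refined likelihood-based and Bayesian methods surveyed in the bibliography.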

    Survey of diagenesis processes and their effect on the reservoir quality of the Kangan Formation in the South Pars Field

    The Kangan and Upper Dalan formations form the reservoir sequence of the South Pars field. The lithology of the Kangan Formation is carbonate (limestone and dolomite with anhydrite intervals), so diagenetic processes were very active in this formation, and these processes have changed its reservoir quality. The important diagenetic processes include dissolution, calcite cementation, dolomitization, anhydritization, physical and chemical compaction, and fracturing.

    Improving Convergence for Nonconvex Composite Programming

    High-dimensional nonconvex problems are popular in today's machine learning and statistical genetics research. Recently, Ghadimi and Lan \cite{Ghadimi} proposed an algorithm to optimize nonconvex high-dimensional problems. There are several parameters in their algorithm that must be set before running it. It is not trivial how to choose these parameters, nor is there, to the best of our knowledge, an explicit rule for selecting the parameters so as to make the algorithm converge faster. We analyze Ghadimi and Lan's algorithm to gain an interpretation based on the inequality constraints for convergence and the upper bound for the norm of the gradient analogue. Our interpretation suggests that their algorithm is a damped Nesterov acceleration scheme. Based on this, we propose an approach for selecting the parameters to improve the convergence of the algorithm. Our numerical studies using high-dimensional nonconvex sparse learning problems, motivated by image denoising and statistical genetics applications, show that convergence can be made, on average, considerably faster than that of the conventional ISTA algorithm for such optimization problems with over 10,000 variables, should the parameters be chosen using our proposed approach. Comment: 10 pages, 2 figures.
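    For reference, the conventional ISTA baseline mentioned in the abstract can be sketched for the lasso problem min_x ½‖Ax − b‖² + λ‖x‖₁. The small sparse-recovery instance below is a hypothetical example for illustration, not a problem from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sparse-recovery instance: 3 of 50 coefficients nonzero.
n_obs, n_var = 100, 50
A = rng.standard_normal((n_obs, n_var))
x_true = np.zeros(n_var)
x_true[[3, 17, 40]] = [2.0, -1.5, 1.0]
b = A @ x_true + 0.01 * rng.standard_normal(n_obs)

lam = 0.1
L = np.linalg.norm(A, 2) ** 2  # Lipschitz constant of the smooth gradient
x = np.zeros(n_var)

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (componentwise shrinkage)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

# ISTA: a gradient step on the smooth least-squares part, followed by
# a proximal (soft-thresholding) step on the l1 penalty.
for _ in range(500):
    grad = A.T @ (A @ x - b)
    x = soft_threshold(x - grad / L, lam / L)
```

    Accelerated schemes such as Ghadimi and Lan's add a momentum (Nesterov-style) extrapolation step on top of this proximal-gradient iteration, which is where the parameter choices the abstract analyzes come in.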